RBiotools is a package for COMPARATIVE MICROBIAL GENOMICS. RBiotools works best in the RStudio integrated development environment. RBiotools works with IDs from GenBank, the NIH genetic sequence data base, to perform analyses and create figures.
Before installation,
1. install and update dependencies
install.packages(c("rlang","ape", "fmsb", "gplots", "grImport", "gridExtra", "msa", "pheatmap", "RCurl", "rentrez", "seqinr"), repos =" http://cran.us.r-project.org ")
install.packages(c("lattice","mgcv","nlme","survival"), repos =" http://cran.us.r-project.org ")
#Install msa and Biostrings package via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager", repos =" http://cran.us.r-project.org ")
BiocManager::install("msa")
BiocManager::install("Biostrings", force = TRUE)
library(msa) ## this library call is required so the dendrogram will work
install.packages("installr", repos =" http://cran.us.r-project.org ")
library(installr) ## this library call is required so we can install RBiotools from the zip source
2. install the RBiotools package from zip source; and load the RBiotools library
install.packages("C:/dc3/RBiotools_0.5.0.tar.gz", repos = NULL, type = "source")
library(RBiotools) ## this library call is required because it's the one we are primarily using
## Welcome to RBiotools!
## © 2021, Michael R. Leuze <RBiotools@gmail.com>
## Do not distribute without permission.
3. Setup the environment and working directory
Choose a directory / folder that will be used for some of the figures, including Heat Maps and Genome Atlases:
#this line checks to see if the folder exists; if it does not then it creates the folder; if it does then it returns "Folder exists already"
ifelse(!dir.exists(".//r_repository//"), dir.create(".//r_repository//"), "Folder exists already")
setwd("C:/dc3/R_repository")
The first step is to choose a set of organisms that you would like to explore. What is an organism?
Let’s use a sample sets of organisms to get started. Specifically, a list of E. coli and Shigella organisms
eColi <- c(
"AP012306", # Escherichia coli str. K-12 substr. MDS42 DNA 3.976 Mb - smallest chromosome
"KK583188", # Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 4.907 Mb - type strain scaffold
"U00096", # Escherichia coli str. K-12 substr. MG1655 4.642 Mb - first E. coli genome sequenced
"CP000802", # Escherichia coli HS 4.644 Mb - group A representative, commensal
"CP000800", # Escherichia coli E24377A 4.980 Mb - group B1 representative
"AP009378", # Escherichia coli SE15 4.717 Mb - group B2 representative, commensal
"FM180568", # Escherichia coli 0127:H6 E2348/69 4.966 Mb - group B2 representative, enteropathogenic
"CU928163", # Escherichia coli UMN026 5.202 Mb - group D representative
"CP008957", # Escherichia coli O157:H7 str. EDL933 5.547 Mp - group E representative
"CP027027", # Shigella dysenteriae strain E670/74 5.037 Mb - Shigella dysenteria representative
"CP026802", # Shigella sonnei strain ATCC 29930 4.975 Mb - Shigella sonnei representative
"CP026877", # Shigella boydii strain ATCC 35964 5.129 Mb - Shigella boydii representative
"CP026793", # Shigella flexneri strain 74-1170 4.734 Mb - Shigella flexneri representative
"CP015831" # Escherichia coli O157 strain 644-PT8 5.831 Mb - largest chromosome
)
and here we have a list of organisms that are not as closely related from the Proteobacteria classes.
proteobacteria <- c(
"CP018228", # Rhizobium leguminosarum strain Vaf-108 Phylum: Proteobacteria (alpha)
"CP017405", # Bordetella pertussis strain J448 Phylum: Proteobacteria (beta)
"CP008957", # Escherichia coli O157:H7 str. EDL933 Phylum: Proteobacteria (gamma)
"HG530068", # Pseudomonas aeruginosa PA38182 Phylum: Proteobacteria (gamma)
"CP002031", # Geobacter sulfurreducens KN400 Phylum: Proteobacteria (delta)
"CP002332" # Helicobacter pylori Gambia94/24 Phylum: Proteobacteria (epsilon)
)
Plot a dendrogram using 16S rRNA genes
dendrogram16S(eColi)
dendrogram16S(proteobacteria)
createAtlas(eColi[1]) # here we index into the list and grab the first item; AP012306 or E.coli str. K-12 substr. MDS42 DNA
rsvg::rsvg_png('C:/dc3/R_repository/GenomeAtlas_AP012306.svg', 'C:/dc3/R_repository/GenomeAtlas_AP012306.png', width = 800)
Note: To view this Genome Atlas, which is stored as an SVG file
* Open a browser ... Safari, Chrome, Firefox, ...
* Click "File" in the menu bar and choose "Open File ..." from the pull-down menu
* Navigate to your Working Directory
* Double-click on "GenomeAtlas_AP012306.svg"
Note: The plotUsage function has many options. See the plotUsage documentation.
plotUsage(eColi[1])
Note: Heat Maps work best for groups of organisms that are NOT closely related (like our Proteobacteria group?)
plotHeatMapCodon(proteobacteria)
Note: The heat map is now in your Working Directory.
* Click on the "Files" tab in the lower left quadrant
* Navigate to your Working Directory (hint: you may need to click on the name of your Working Directory to update it)
* Click on the "HeatMapCodon.png" file
Note: Blast Matrices work best for groups of closely related organisms (like our Proteobacteria group?)
proteinGrouping <- runLinclust(proteobacteria)
plotBlastMatrix(proteinGrouping)
Note: To view this Homology Matrix, which is stored as an SVG file
* Open a browser ... Safari, Chrome, Firefox, ...
* Click "File" in the menu bar and choose "Open File ..." from the pull-down menu
* Navigate to your Working Directory
* Double-click on "BlastMatrix.svg"